2a91de02871011d0090e662ffd6f2328-Supplemental-Conference.pdf

Neural Information Processing Systems

The structure of the appendix mainly follows the roadmap of the proof described in Section 4.4. In Appendix A, we define the characterizable population risk function in (31) to approximate the objective function. Appendix A also introduces some notation that simplifies the analysis, and we recommend that readers refer to Table 3 for the major notations used in the proofs. In this paper, we consider multi-layer cases and therefore need to derive a lower bound on the Hessian matrix for all the layers. Moreover, the input of an intermediate layer cannot be proved to be Gaussian, but it does belong to a sub-Gaussian distribution.
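The following sketch illustrates why an intermediate-layer input may fail to be Gaussian yet still be sub-Gaussian. It is not the paper's construction or proof. It rests on assumptions that do not come from the source: a single hypothetical hidden unit `h = relu(w^T x)` with standard Gaussian input `x` and unit-norm weight `w`. Such an `h` has an atom at zero, so it is not Gaussian. However, `relu(w^T x)` is `||w||`-Lipschitz in `x`, so the Gaussian concentration inequality P(|f(X) - E f(X)| >= t) <= 2 exp(-t^2 / (2 L^2)) bounds its tails, which is the defining sub-Gaussian property. The helper names (`sample_hidden_unit`, `empirical_tail`, `gaussian_lipschitz_bound`, `zero_fraction`) are illustrative only.

```python
import math
import random


def relu(v: float) -> float:
    """Rectified linear unit, used here as an assumed intermediate-layer activation."""
    return v if v > 0.0 else 0.0


def sample_hidden_unit(n: int, seed: int = 0) -> list[float]:
    """Draw n samples of the hypothetical unit h = relu(w^T x), x ~ N(0, I), ||w|| = 1.

    Because ||w|| = 1, w^T x is exactly N(0, 1), so it is sampled directly.
    """
    rng = random.Random(seed)
    return [relu(rng.gauss(0.0, 1.0)) for _ in range(n)]


def empirical_tail(samples: list[float], t: float) -> float:
    """Fraction of samples with |h - mean(h)| > t; the sample mean stands in for E[h]."""
    mean = sum(samples) / len(samples)
    return sum(1 for s in samples if abs(s - mean) > t) / len(samples)


def gaussian_lipschitz_bound(t: float, lipschitz: float = 1.0) -> float:
    """Gaussian concentration bound 2 exp(-t^2 / (2 L^2)) for an L-Lipschitz f of N(0, I).

    relu(w^T x) is ||w||-Lipschitz in x, so L = 1 for the unit-norm w assumed above.
    """
    return 2.0 * math.exp(-t * t / (2.0 * lipschitz ** 2))


def zero_fraction(samples: list[float]) -> float:
    """Share of samples exactly equal to 0; a Gaussian puts no mass on a single point."""
    return sum(1 for s in samples if s == 0.0) / len(samples)


if __name__ == "__main__":
    h = sample_hidden_unit(100_000)
    # Roughly half the mass sits at 0, so h is not Gaussian.
    print(f"fraction of exact zeros: {zero_fraction(h):.3f}")
    # Its tails are nonetheless dominated by a Gaussian-type bound (sub-Gaussian).
    for t in (1.0, 2.0, 3.0):
        print(f"t={t}: empirical tail {empirical_tail(h, t):.5f} "
              f"<= bound {gaussian_lipschitz_bound(t):.5f}")
```

In this toy case the empirical tail frequencies stay well below the bound, while about half the samples are exactly zero. This is the qualitative combination the paragraph describes: an input that is not Gaussian but still has sub-Gaussian tails.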